Skip to content

Conversation

@romseygeek
Copy link
Contributor

@romseygeek romseygeek commented Nov 3, 2025

This adds a competitive iterator implementation that will take advantage of doc
value skippers in the case that:

  • the index is sorted by a low cardinality field like hostname, and then by a high cardinality
    field like timestamp
  • skippers are enabled on both of these fields
  • the query is sorted by the high cardinality field.

To be able to plug this new implementation into the lucene sort architecture, we need
to fork NumericComparator and some associated classes. LongValuesComparatorSource
now returns the forked version with the new competitive iterator builder.

romseygeek and others added 13 commits October 29, 2025 09:45
… with the recorded bottom.

When construction a new CompetitiveDISIBuilder, then check whether global min/max points or global min/max doc values skipper are comparative with the bottom.
If so, then update competitiveIterator with an empty iterator, because no documents will have a value that is competitive with the current recorded bottom in the current segment.

Doing this at CompetitiveDISIBuilder construction is cheap and allows to immediately prune, instead of waiting until doUpdateCompetitiveIterator(...) is invoked.
@romseygeek romseygeek self-assigned this Nov 3, 2025
@romseygeek romseygeek added >enhancement :Search/Search Search-related issues that do not fall into other categories v9.3.0 labels Nov 3, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Nov 3, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Copy link
Collaborator

Hi @romseygeek, I've created a changelog YAML for you.

@romseygeek
Copy link
Contributor Author

Buildkite benchmark this with elastic/logs please

@romseygeek
Copy link
Contributor Author

Buildkite benchmark this with elastic-logs please

@elasticmachine
Copy link
Collaborator

elasticmachine commented Nov 6, 2025

💚 Build Succeeded

This build ran two elastic-logs benchmarks to evaluate performance impact of this PR.

History

cc @romseygeek

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work 👍 . I left a few comments, but otherwise LGTM.

* range, using DocValueSkippers on the primary sort field to advance rapidly
* to the next block of values.
*/
class SecondarySortIterator extends DocIdSetIterator {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a unit test for this iterator? Maybe we can do a test duel against DocValuesRangeIterator wrapped in an interator?

}
DocValuesSkipper skipper = context.reader().getDocValuesSkipper(field);
DocValuesSkipper primaryFieldSkipper = context.reader().getDocValuesSkipper(sortFields[0].getField());
if (primaryFieldSkipper == null || skipper.docCount() != maxDoc || primaryFieldSkipper.docCount() != maxDoc) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case primaryFieldSkipper is null, then the secondary sort field can be treated as primary sort and we can go faster (like in SortedNumericDocValuesRangeQuery#getDocIdSetIteratorOrNullForPrimarySort). But I don't think this happens now? Because super.buildCompetitiveDISIBuilder(context); will not detect this?

The same applies if primary sort field has just one value.

If this is true. Let's address this then in a followup?

competitiveDISIBuilder = buildCompetitiveDISIBuilder(context);
}

protected CompetitiveDISIBuilder buildCompetitiveDISIBuilder(LeafReaderContext context) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the bit we want to contribute to upstream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly

@romseygeek romseygeek enabled auto-merge (squash) November 9, 2025 15:38
@romseygeek romseygeek disabled auto-merge November 9, 2025 15:39
@romseygeek romseygeek merged commit 2236cbf into elastic:main Nov 9, 2025
33 of 34 checks passed
Kubik42 pushed a commit to Kubik42/elasticsearch that referenced this pull request Nov 10, 2025
This adds a competitive iterator implementation that will take advantage of doc
value skippers in the case that:
* the index is sorted by a low cardinality field like hostname, and then by a high cardinality
field like timestamp
* skippers are enabled on both of these fields
* the query is sorted by the high cardinality field.

To be able to plug this new implementation into the lucene sort architecture, we need
to fork NumericComparator and some associated classes. LongValuesComparatorSource
now returns the forked version with the new competitive iterator builder.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants